Addressing the Blind Spots in Spoken Language Processing
This paper explores the critical but often overlooked role of non-verbal
cues, including co-speech gestures and facial expressions, in human
communication and their implications for Natural Language Processing (NLP). We
argue that understanding human communication requires a more holistic approach
that goes beyond textual or spoken words to include non-verbal elements.
Borrowing from advances in sign language processing, we propose the development
of universal automatic gesture segmentation and transcription models to
transcribe these non-verbal cues into textual form. Such a methodology aims to
address the blind spots in spoken language understanding, enhancing the scope
and applicability of NLP models. Through motivating examples, we demonstrate
the limitations of relying solely on text-based models. We propose a
computationally efficient and flexible approach for incorporating non-verbal
cues, which can seamlessly integrate with existing NLP pipelines. We conclude
by calling upon the research community to contribute to the development of
universal transcription methods and to validate their effectiveness in
capturing the complexities of real-world, multi-modal interactions.
Comment: 5 pages
pose-format: Library for Viewing, Augmenting, and Handling .pose Files
Managing and analyzing pose data is a complex task, with challenges ranging
from handling diverse file structures and data types to facilitating effective
data manipulations such as normalization and augmentation. This paper presents
\texttt{pose-format}, a comprehensive toolkit designed to address these
challenges by providing a unified, flexible, and easy-to-use interface. The
library includes a specialized file format that encapsulates various types of
pose data, accommodating multiple individuals and an arbitrary number of time
frames, making it suitable for both image and video data. Furthermore, it
offers seamless integration with popular numerical libraries such as NumPy,
PyTorch, and TensorFlow, thereby enabling robust machine-learning applications.
Through benchmarking, we demonstrate that our \texttt{.pose} file format offers
vastly superior performance over prevalent formats such as OpenPose, with the
added advantage of a self-contained pose specification. Additionally, the
library includes features for data normalization and augmentation, as well as
easy-to-use visualization capabilities in both Python and browser environments.
\texttt{pose-format} emerges as a one-stop solution, streamlining the
complexities of pose data management and analysis.
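The kind of normalization such a toolkit automates can be sketched in plain NumPy. The following is an illustrative sketch only: the (frames, points, dims) array layout, the function name, and the choice of shoulder keypoints as the reference are assumptions, not the library's actual API.

```python
import numpy as np

def normalize_pose(keypoints: np.ndarray, left: int, right: int) -> np.ndarray:
    """Center a (frames, points, dims) keypoint array on the mid-shoulder
    point and scale by the mean shoulder width, so poses captured at
    different distances or resolutions become comparable.

    `left` and `right` are assumed indices of the two shoulder keypoints.
    """
    mid = (keypoints[:, left] + keypoints[:, right]) / 2.0          # (frames, dims)
    width = np.linalg.norm(keypoints[:, left] - keypoints[:, right], axis=-1)
    scale = width.mean() or 1.0                                     # guard against zero width
    return (keypoints - mid[:, None, :]) / scale
```

Centering plus scale invariance is the standard precondition for comparing pose sequences across recordings; the library's own normalization presumably follows the same principle with its stored pose specification.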
Ham2Pose: Animating Sign Language Notation into Pose Sequences
Translating spoken languages into Sign languages is necessary for open
communication between the hearing and hearing-impaired communities. To achieve
this goal, we propose the first method for animating a text written in
HamNoSys, a lexical Sign language notation, into signed pose sequences. As
HamNoSys is universal, our proposed method offers a generic solution invariant
to the target Sign language. Our method gradually generates pose predictions
using transformer encoders that create meaningful representations of the text
and poses while considering their spatial and temporal information. We use weak
supervision for the training process and show that our method succeeds in
learning from partial and inaccurate data. Additionally, we offer a new
distance measurement for pose sequences, normalized Dynamic Time Warping
(nDTW), based on DTW over normalized keypoint trajectories, and validate its
correctness using AUTSL, a large-scale Sign language dataset. We show that it
measures the distance between pose sequences more accurately than existing
measurements and use it to assess the quality of our generated pose sequences.
Code for the data pre-processing, the model, and the distance measurement is
publicly released for future research.
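The idea behind nDTW, classic dynamic time warping applied to normalized keypoint trajectories, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the centering-and-unit-scaling normalization is a stand-in for whatever keypoint normalization nDTW actually uses, and all names are hypothetical.

```python
import numpy as np

def normalize_trajectory(traj: np.ndarray) -> np.ndarray:
    """Center a (T, D) trajectory and scale it to unit spread, so the
    distance ignores absolute position and scale (illustrative choice)."""
    centered = traj - traj.mean(axis=0)
    scale = centered.std() or 1.0
    return centered / scale

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic DTW between two (T, D) trajectories: the minimal accumulated
    Euclidean frame distance over all monotonic alignments."""
    t1, t2 = len(a), len(b)
    cost = np.full((t1 + 1, t2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[t1, t2])
```

Normalizing before warping is what makes the measure robust to camera placement: a translated copy of a trajectory has zero distance to the original after normalization, while raw DTW would report a large one.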
Machine Translation between Spoken Languages and Signed Languages Represented in SignWriting
This paper presents work on novel machine translation (MT) systems between spoken and signed languages, where signed languages are represented in SignWriting, a sign language writing system. Our work seeks to address the lack of out-of-the-box support for signed languages in current MT systems and is based on the SignBank dataset, which contains pairs of spoken language text and SignWriting content. We introduce novel methods to parse, factorize, decode, and evaluate SignWriting, leveraging ideas from neural factored MT. In a bilingual setup, translating from American Sign Language to (American) English, our method achieves over 30 BLEU, while in two multilingual setups, translating in both directions between spoken languages and signed languages, we achieve over 20 BLEU. We find that common MT techniques used to improve spoken language translation similarly affect the performance of sign language translation. These findings validate our use of an intermediate text representation for signed languages to include them in natural language processing research.
Considerations for meaningful sign language machine translation based on glosses
Automatic sign language processing is gaining popularity in Natural Language Processing (NLP) research (Yin et al., 2021). In machine translation (MT) in particular, sign language translation based on glosses is a prominent approach. In this paper, we review recent works on neural gloss translation. We find that limitations of glosses in general and limitations of specific datasets are not discussed in a transparent manner and that there is no common standard for evaluation. To address these issues, we put forward concrete recommendations for future research on gloss translation. Our suggestions advocate awareness of the inherent limitations of gloss-based approaches, realistic datasets, stronger baselines, and convincing evaluation.
Linguistically Motivated Sign Language Segmentation
Sign language segmentation is a crucial task in sign language processing
systems. It enables downstream tasks such as sign recognition, transcription,
and machine translation. In this work, we consider two kinds of segmentation:
segmentation into individual signs and segmentation into phrases, larger units
comprising several signs. We propose a novel approach to jointly model these
two tasks.
Our method is motivated by linguistic cues observed in sign language corpora.
We replace the predominant IO tagging scheme with BIO tagging to account for
continuous signing. Given that prosody plays a significant role in phrase
boundaries, we explore the use of optical flow features. We also provide an
extensive analysis of hand shapes and 3D hand normalization.
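To illustrate why the tagging scheme matters: under IO tagging, two signs that follow each other with no intervening O frame merge into a single span, whereas BIO's B tag marks each sign's first frame and recovers the boundary. The following sketch is illustrative only; the tag sequences and the decoder are not the paper's implementation.

```python
def spans(tags):
    """Decode a frame-level tag sequence into (start, end) sign spans,
    end-exclusive. A 'B' always opens a new span; an 'I' continues one
    (or opens one if none is active, for IO-style input); 'O' closes it."""
    result, start = [], None
    for i, t in enumerate(tags):
        if t == "B":
            if start is not None:
                result.append((start, i))
            start = i
        elif t == "I" and start is None:
            start = i
        elif t == "O" and start is not None:
            result.append((start, i))
            start = None
    if start is not None:
        result.append((start, len(tags)))
    return result

# Two back-to-back signs (frames 0-1 and 2-3), a pause, then a third sign:
bio_tags = ["B", "I", "B", "I", "O", "B", "I"]   # boundary at frame 2 kept
io_tags  = ["I", "I", "I", "I", "O", "I", "I"]   # boundary at frame 2 lost
```

Here `spans(bio_tags)` yields three spans but `spans(io_tags)` only two, which is exactly the continuous-signing failure mode that motivates replacing IO with BIO.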
We find that introducing BIO tagging is necessary to model sign boundaries.
Explicitly encoding prosody by optical flow improves segmentation in shallow
models, but its contribution is negligible in deeper models. Careful tuning of
the decoding algorithm atop the models further improves the segmentation
quality.
We demonstrate that our final models generalize to out-of-domain video
content in a different signed language, even under a zero-shot setting. We
observe that including optical flow and 3D hand normalization enhances the
robustness of the model in this context.
Comment: Accepted at EMNLP 2023 (Findings)